West Kazakhstan Region
SUPN: Shallow Universal Polynomial Networks
Morrow, Zachary, Penwarden, Michael, Chen, Brian, Javeed, Aurya, Narayan, Akil, Jakeman, John D.
Deep neural networks (DNNs) and Kolmogorov-Arnold networks (KANs) are popular methods for function approximation due to their flexibility and expressivity. However, they typically require a large number of trainable parameters to produce a suitable approximation. Beyond making the resulting network less transparent, overparameterization creates a large optimization space, likely producing local minima in training that have quite different generalization errors. In this case, network initialization can have an outsize impact on the model's out-of-sample accuracy. For these reasons, we propose shallow universal polynomial networks (SUPNs). These networks replace all but the last hidden layer with a single layer of polynomials with learnable coefficients, leveraging the strengths of DNNs and polynomials to achieve sufficient expressivity with far fewer parameters. We prove that SUPNs converge at the same rate as the best polynomial approximation of the same degree, and we derive explicit formulas for quasi-optimal SUPN parameters. We complement theory with an extensive suite of numerical experiments involving SUPNs, DNNs, KANs, and polynomial projection in one, two, and ten dimensions, consisting of over 13,000 trained models. On the target functions we numerically studied, for a given number of trainable parameters, the approximation error and variability are often lower for SUPNs than for DNNs and KANs by an order of magnitude. In our examples, SUPNs even outperform polynomial projection on non-smooth functions.
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Russia (0.04)
- Government > Regional Government > North America Government > United States Government (1.00)
- Energy (0.68)
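The SUPN abstract above describes a single layer of polynomials with learnable coefficients feeding the last hidden layer. A minimal one-dimensional sketch of one possible reading of that structure follows. The Chebyshev basis, the tanh activation, and the names `supn_forward`, `coeffs`, `out_weights`, and `count_parameters` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from numpy.polynomial.chebyshev import chebvander


def supn_forward(x, coeffs, out_weights, out_bias=0.0):
    """Evaluate a hypothetical shallow polynomial network at points x.

    x           : (n,) inputs, assumed to lie in [-1, 1]
    coeffs      : (width, degree + 1) learnable polynomial coefficients,
                  one row per hidden unit (Chebyshev basis is an assumption)
    out_weights : (width,) weights of the linear output layer
    out_bias    : scalar output bias
    """
    degree = coeffs.shape[1] - 1
    basis = chebvander(x, degree)        # (n, degree + 1) basis values T_k(x)
    pre_activation = basis @ coeffs.T    # (n, width): one polynomial per unit
    hidden = np.tanh(pre_activation)     # assumed activation of the hidden layer
    return hidden @ out_weights + out_bias


def count_parameters(coeffs, out_weights):
    """Trainable parameters: polynomial coefficients, output weights, one bias."""
    return coeffs.size + out_weights.size + 1


# Usage: 3 hidden units, each a degree-4 polynomial of the input.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 5)
coeffs = rng.normal(size=(3, 5))
out_weights = rng.normal(size=3)
y = supn_forward(x, coeffs, out_weights)
```

In this sketch the parameter count grows as width times (degree + 1), which is the quantity one would hold fixed when comparing against a DNN or KAN of similar size.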
Priors in Time: Missing Inductive Biases for Language Model Interpretability
Lubana, Ekdeep Singh, Rager, Can, Hindupur, Sai Sumedh R., Costa, Valerie, Tuckute, Greta, Patel, Oam, Murthy, Sonia Krishna, Fel, Thomas, Wurgaft, Daniel, Bigelow, Eric J., Lin, Johnny, Ba, Demba, Wattenberg, Martin, Viegas, Fernanda, Weber, Melanie, Mueller, Aaron
Recovering meaningful concepts from language model activations is a central aim of interpretability. While existing feature extraction methods aim to identify concepts that are independent directions, it is unclear if this assumption can capture the rich temporal structure of language. Specifically, via a Bayesian lens, we demonstrate that Sparse Autoencoders (SAEs) impose priors that assume independence of concepts across time, implying stationarity. Meanwhile, language model representations exhibit rich temporal dynamics, including systematic growth in conceptual dimensionality, context-dependent correlations, and pronounced non-stationarity, in direct conflict with the priors of SAEs. Taking inspiration from computational neuroscience, we introduce a new interpretability objective -- Temporal Feature Analysis -- which possesses a temporal inductive bias to decompose representations at a given time into two parts: a predictable component, which can be inferred from the context, and a residual component, which captures novel information unexplained by the context. Temporal Feature Analyzers correctly parse garden path sentences, identify event boundaries, and more broadly delineate abstract, slow-moving information from novel, fast-moving information, while existing SAEs show significant pitfalls in all the above tasks. Overall, our results underscore the need for inductive biases that match the data in designing robust interpretability tools.
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Asia > Kazakhstan > West Kazakhstan Region (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
We thank the reviewers for their kind comments and for their consensus view that our theoretical results on TV modulus
We are also thankful for the reviewers' concrete suggestions on improving the draft. We agree with the reviewers that our proposed estimators are not computationally efficient. We work in the high-temperature regime i.e., We agree with the reviewer that our estimator doesn't recover the true model even in Model Width as relaxed parameter. We thank the reviewer for raising this subtle issue.
Complexity Scaling Laws for Neural Models using Combinatorial Optimization
Weissman, Lowell, Krumdick, Michael, Abbott, A. Lynn
Recent work on neural scaling laws demonstrates that model performance scales predictably with compute budget, model size, and dataset size. In this work, we develop scaling laws based on problem complexity. We analyze two fundamental complexity measures: solution space size and representation space size. Using the Traveling Salesman Problem (TSP) as a case study, we show that combinatorial optimization promotes smooth cost trends, and therefore meaningful scaling laws can be obtained even in the absence of an interpretable loss. We then show that suboptimality grows predictably for fixed-size models when scaling the number of TSP nodes or spatial dimensions, independent of whether the model was trained with reinforcement learning or supervised fine-tuning on a static dataset. We conclude with an analogy to problem complexity scaling in local search, showing that a much simpler gradient descent of the cost landscape produces similar trends.
- North America > United States > Virginia (0.76)
- Asia > Middle East > Jordan (0.04)
- North America > Canada > Alberta (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
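The complexity-scaling abstract above tracks how suboptimality grows with the number of TSP nodes or spatial dimensions. As a rough illustration of the quantity involved, the sketch below measures the relative gap between a heuristic tour and a brute-force optimum on a tiny instance. Defining suboptimality as `cost / optimal - 1`, using nearest-neighbor as the heuristic, and all function names are assumptions made for illustration, not the paper's setup.

```python
import itertools
import math


def tour_length(points, order):
    """Length of the closed tour visiting points in the given order."""
    n = len(order)
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % n]])
        for i in range(n)
    )


def optimal_tour_length(points):
    """Brute-force optimum; only feasible for a handful of nodes."""
    rest = range(1, len(points))  # fix node 0 as the start to skip rotations
    return min(
        tour_length(points, (0,) + perm)
        for perm in itertools.permutations(rest)
    )


def nearest_neighbor_tour(points):
    """Greedy heuristic: always move to the closest unvisited node."""
    unvisited = set(range(1, len(points)))
    order = [0]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def suboptimality(points, order):
    """Relative gap to the optimum (assumed definition: cost / optimal - 1)."""
    return tour_length(points, order) / optimal_tour_length(points) - 1.0


# Usage: points may have any number of coordinates, so the same code
# covers scaling in spatial dimension as well as in node count.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
gap = suboptimality(square, nearest_neighbor_tour(square))
```

Because `math.dist` accepts points of any dimension, the same functions can be reused when varying either the node count or the dimensionality, though the brute-force reference limits this to very small instances.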
Representational Difference Explanations
Kondapaneni, Neehar, Mac Aodha, Oisin, Perona, Pietro
We propose a method for discovering and visualizing the differences between two learned representations, enabling more direct and interpretable model comparisons. We validate our method, which we call Representational Differences Explanations (RDX), by using it to compare models with known conceptual differences and demonstrate that it recovers meaningful distinctions where existing explainable AI (XAI) techniques fail. Applied to state-of-the-art models on challenging subsets of the ImageNet and iNaturalist datasets, RDX reveals both insightful representational differences and subtle patterns in the data. Although comparison is a cornerstone of scientific analysis, current tools in machine learning, namely post hoc XAI methods, struggle to support model comparison effectively. Our work addresses this gap by introducing an effective and explainable tool for contrasting model representations.
- Europe > Norway (0.04)
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
- Asia > Kazakhstan > West Kazakhstan Region (0.04)
- North America > United States (0.14)
- Asia > Kazakhstan > West Kazakhstan Region (0.04)
- Information Technology (1.00)
- Government (1.00)
- Law (0.92)
- Leisure & Entertainment > Sports > Soccer (0.68)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > Dominican Republic (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Research Report > Experimental Study (0.93)
- Workflow (0.93)
Supplementary Material: TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning
Author ordering is determined by coin flip. For what purpose was the dataset created? Was there a specific task in mind? In order to systematically compare the location encoders' performance and their impact on the Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., Who funded the creation of the dataset? Dr. Gengchen Mai acknowledges the Microsoft Research What do the instances that comprise the dataset represent (e.g., documents, photos, people, The instances in all 17 datasets represent images.
- South America (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Colorado > Jefferson County > Golden (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Hong Kong (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Natural Language (0.68)
- Information Technology > Modeling & Simulation (0.67)